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ABSTRACT 

Although it is widely understood that pulsar timing observations generally contain 
time-correlated stochastic signals (TCSSs; red timing noise is of this type), most data 
analysis techniques that have been developed make an assumption that the stochastic 
uncertainties in the data are uncorrelated, i.e. "white" . Recent work has pointed out 
that this can introduce severe bias in determination of timing- model parameters, and 
that better analysis methods should be used. This paper presents a detailed investiga- 
tion of timing-model fitting in the presence of TCSSs, and gives closed expressions for 
the post-fit signals in the data. This results in a Bayesian technique to obtain timing- 
model parameter estimates in the presence of TCSSs, as well as computationally more 
efficient expressions of their marginalised posterior distribution. A new method to 
analyse hundreds of mock dataset realisations simultaneously without significant com- 
putational overhead is presented, as well as a statistically rigorous method to check the 
internal consistency of the results. As a by-product of the analysis, closed expressions 
of the rms introduced by a stochastic background of gravitational-waves in timing- 
residuals are obtained. Using T as the length of the dataset, and ft-cllyr^^) as the 
characteristic strain, this is: cr^^B = /ic(lyr"^)^(9v^2^r(-10/3)/8008)yr"''/3Tio/3_ 
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1 INTRODUCTION 

Over the years, pulsar timing has proved to be a useful 
tool for probing a wide range of science. Prime examples in- 
clude the confirmation of t he emission of gravitational waves 
IXavlor fc Weisbergj|l982l'). an d ver y accurate tests of gen- 
eral relativity ( Kramer et al.l |2006| ) . A n overview of tech- 
niques used in pulsar timing is given in lLorimer fc Kramed 
(|2005l ). and a detailed description of current treatment of 
timing data is given in t he Tempo2 papers (|Hobbs et al.l 
l2006l : lEdwards et al.|[2006l 'l. Much of the interesting science 
results from the fact that accurate measurements of the 
times of arrival (TOAs) of the radio pulses allow one to 
precisely track the pulsar trajectories relative to the Earth, 
and and that the observed TOAs can be accounted for very 
precisely by building a physical model of pulsar trajectory, 
pulse propagation, and pulsar spin evolution in relativistic 
gravity. Such a model is referred to as the timing-model; 
mathematically the parameters of the timing-model are de- 
termined by the minimisation for the TO A fit. 

The remaining differences between the TOAs and the 
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timing-model are called timing-residuals (TRs). The physics 
beyond that included in the timing-model is contained in 
TRs. Of particular interest are the time-correlated stochas- 
tic signals (TCSSs), examples of which include: 

1) the so-called pulsar timing noise, or "red noise", a 
generic term for random changes in the pulsar rotational 
rate, thought to be possibly due to the random angular- 
momentum exchange between the normal and superfiuid 
components, 

2) the time-dependent influence of the interstellar medium 
on the optical pathlength between the pulsar and the Earth, 
and 

3) the influence of the stochastic background of gravitational 
waves (GWB) on the pulse TOAs. 

For a r ecent discussion of aU of th ese, see lCordes fc Shannon! 
(|2O10l ): [shannon fc Corded (|2010l ). 

All of the above processes feature a red spectrum, and 
their contributions to the TOAs are difflcult to disentangle 
from the variation of the deterministic timing-model param- 
eters. The purpose of this paper is to develop a rigorous and 
efficient procedure for a TOA analysis which includes simul- 
taneously the timing- model and a TCSS. The two specific 
questions which are explicitly addressed are: 
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1) How does the timing-model fitting affect the statistical 
properties of TCSS-induced timing-residuals? 

2) Conversely, how does the presence of a TCSS affect the 
uncertainties of the timing-model parameters? 

Our method, as well as our answers to the above questions, 
are extensively tested on mock data. 

The detailed plan of the paper is as follows In Section[2] 
we develop a formalism that casts fitting of the timing-model 
in the presence of red noise as a non-orthogonal projection of 
the covariance matrix. A connectio n with least-squar es fit- 
ting methods is made (as used by e.g. lColes et al.|[201ll . here- 
after CHCMV). This leads to understanding of degeneracies 
between the covariance matrix and the timing-model. We 
show how to exploit these degeneracies in Section |3l which 
results in improved expressions for the covariance function 
and the marginalised posterior distribution. We describe 
how to obtain estimates for the timing-model parameters 
in Section 3] In Section [S] we introduce a computationally 
efficient method to analyse hundreds of mock datasets si- 
multaneously, and we describe a powerful test based on the 
Kolmogorov-Smirnov statistic that we use to check that the 
Bayesian analysis method produces consistent results. Fi- 
nally, we compare our results with the recently proposed 
Cholesky method of CHCMV in Section [1 



the (n X m) matrix M is the so-called design matrix (s ee e.g. 
§15.4 of IPress et allll992l : \7an Haasteren et al.ll2009l . here- 
after vHLML), which describes how the timing-residuals de- 
pend on the model parameters. Without loss of generality, 
we assume here and in subsequent sections that M has been 
constructed such that its columns are linearly independent. 
Although the distinction between deterministic and TCSS 
contributions here seems analogous to Equation ([1]), we note 
here that there is significant absorption of TCSSs in the fit. 

The deterministic signals in t'^"^ are well modelled 
in standard pulsa r timing packages like e.g., Tempo2 
iHobbs et al.ll2006l '). We model the TCSS contributions as a 
random Gaussian process with a covariance matrix defined 
by: 



(3) 



where the brackets (...) denote the ensemble average over 
many realisations of the random process, and the indices i 
and j run from 1 to n. The covariance matrix is the numeri- 
cal representation of the covariance function, and we assume 
here that it can be parametrised with parameters 0. We use 
the following convention for the Wiener-Khinchin theorem 
to relate the covariance function to the spectral density: 



2 TIMING-MODEL FITS AND THE 
COVARIANCE FUNCTION 

The observed TOAs of every pulsar contain contributions 
from many deterministic and stochastic processes. The tra- 
ditional procedure ignores any stochastic or unknown deter- 
ministic contributions to the timing-residuals ( e.g. the stan- 
dard weighted least-squares fit in Tempo2, see iHobbs et al.l 
l2006l ). Therefore, some of these get absorbed into the timing- 
model fits, which also alters the estimates of the timing- 
model parameters. In this section we show how to deal with 
TCSSs in combination with fitting to the timing-model. 

2.1 The covariance function and least-squares 
fitting 

We describe the n TOAs of a single pulsar as an addition of 
a deterministic and a stochastic part: 

T"" = -f 57*"^ (1) 

where the n elements of t are the observed TOAs, t'^^* are 
the deterministic contributions to the TOAs, and St are 
the stochastic contributions to the TOAs, which in this 
work are TCSSs all modelled by a random Gaussian pro- 
cess. In practice, the pre-fit timing residuals are produced 
with first estimates /3oi of the m timing-model parameters 
/3i; this initial guess is usually precise enough so that a 
linear a pproximation of the timing-model can be used fur- 
ther on (|Edwards et al.ir2006l ) . Namely, it is a good assump- 
tion that the remaining timing-residuals depend linearly on 

= /3a — Poa- 



prf 



St = St + M£_ 



(2) 



where St are the timing-residuals in the linear approxima- 
tion to the timing-model, St^ is the vector of pre-fit timing- 
residuals, ^ is the vector with timing-model parameters, and 



C{t) = / 5'(/) cos(r/) df , 



(4) 



where S{f) is the spectral density of St as a function 
of frequency, and r = 27r|ti — t2\ is the time difference 
between two observations. We emphasise here that, unlike 
some methods proposed in the literature (e.g. CHCMV), we 
do not model the post-fit timing-residuals, but the timing- 
residuals St prior to the fitting process. 

The Bayesian likelihood of the timing-residuals is given 
by (vHLML): 



P{St\iA) = 



^(27r)"detC 

exp [st - Kltf C'^ [st - Mi 



(5) 



Provided we know the value of the parameters (j> prior to the 
analysis, i.e. provided we know the covariance matrix C, we 
can maximise the likelihood with respect to the model pa- 
rameters. This results in the generalised least-squares (GLS) 
estimator for the timing-model parameters: 



St 



prf 



St"^ 



= ii„-B)sr°' = osr°\ 



(6) 



where are the best-fit timing-model parameters, I„ is the 

71-dimensional identity matrix, St^ are the post-fit resid- 
uals, and O and B are the matrices that represent the 
"removal" of the timing-model. The Cholesky method of 
CHCMV uses this GSL estimator, combined with an esti- 
mate for the covariance matrix C. In pulsar timing we gen- 
erally do not know the values of (j) prior to the analysis. 
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2.2 Effect of fitting on tiie covariance function 

Irrespective of what technique is used to produce the best-fit 
timing-model parameters and the post-fit timing-residuals, 
the resulting post-fit timing-residuals are not correlated ac- 
cording to Equation @. In order to compute the exact ef- 
fect that fitting has on the post-fit correlations, we take the 
following approach. Lets assume that we have timing resid- 
ual vectors of length n, defined on an interval [—T, T] , and 
some deterministic process defined by m parameters repre- 
sented by ^. The effect of the deterministic process on the 
timing-residuals is given by M^. Typically, a fitting proce- 
dure removes the contributions of the parameters ^ to the 
timing-residuals with respect to some inner product. We de- 
fine the inner product on the vector space of timing-residuals 
as: 



y, 



(7) 



where x and y are vectors of timing-residuals, i5 is a positive 
definite symmetric (PDS) matrix, and {...,.. .)e indicates 
an inner product with PDS matrix E. Consider a fitting pro- 
cedure that produces post-fit timing-residuals that satisfy: 
{x, M^) _B = 0. It is straightforward to check that this is the 
weighted least-squares fit when Eij = Sijaf. When E is a 
more general PDS matrix, the fitting process that removes 
all from the timing-residuals is the GLS of Equation (0). 
For our purposes, the exact form of E is not relevant. 

We use the expressions for O and B as in Equa- 
tion ((61, but now with covariance function E: B — 
M {M'^E-^My'^ M'^E-\ Both O and B are non- 
orthogonal projection matrices: B — B^, B^ 7^ B and like- 
wise for O. The correlations in post-fit timing-residuals due 
to a rand om Gaussian process with covariance matrix C is 
given by (|Demorest et al.ll2012l ): 



5ti Stj \ 



OSti 



pof 



OSt 



pof 



oco' 



(8) 

The correlations in the post-fit timing-residuals are thus 
given by the covariance matrix of the random process C, 
projected with the matrix O, where O removes any con- 
tribution A/^ with respect to the inner product of Equa- 
tion (Q. From here onwards, we omit the superscript "pof" 
for St, and by default we assume we are dealing with post-fit 
timing-residuals, with respect to some inner-product. 



3 EXPLOITING FITTING DEGENERACIES 

Because the fitting process is effectively a projection of the 
timing residuals, the covariance matrix that describes the 
post-fit residuals is also a projection of the pre-fit covariance 
matrix. This post-fit covariance matrix is therefore singular, 
and the pre-fit covariance matrix cannot be reconstructed 
from the post-fit covariance matrix alone: there is a degen- 
eracy in the processes that could have generated a single re- 
alisation of post-fit timing-residuals. In this section we use 
this degeneracy to derive closed expressions for the post- 
fit covariance function, and more computationally efficient 
expressions for the marginalised posterior distribution. 



3.1 The post-fit covariance function 

In Equation ((8| we used the non-orthogonal projection ma- 
trices O and B to remove any contribution to the timing- 
residuals with respect to the inner product of Equation ((T)). 
Now consider two related projection matrices: 



D = M M 



W 



D. 



(9) 



Both D and W are orthogonal projections, which satisfy 
= W and W = W^, and likewise for D. The relation 
with B and O is intuitive: if the covariance matrix E of 
the inner product of Equation ((Tjl is the identity matrix, 
then we have W — O, and D — B. These four projection 
matrices have the following interesting properties: BD = D 
and DB — B, and similar expressions for W and O. From 
this it follows that: 



( StSt ) = OCO' = OWCW' O' 



(10) 



where the square matrix SiSt is the dyadic product of two 
vectors. This expression shows that the degeneracy in the 
covariance function of the post-fit timing-residuals allows 
us to equivalently use WCW'^ instead of C. In Section [331 
we derive analytical expression for this post-fit covariance 
function, see Equation p8|l . 



3.2 A computationally more efficient marginalised 
posterior 

As shown in vHLML, it is possible to analytically 
marginalise the posterior distribution when a fiat prior 
distribution is assumed for the linear parameters (in Ap- 
pendix |B] we show how to include Gaussian priors). The 
marginalised posterior distribution is then equal to the like- 
lihood function of Equation ([5]) integrated over the linear 
parameters ^: 



det [MTC-^M)-^ 

, — X 

^(27r)"~'"detC 

exp f^St^C'Stj , (11) 



with: 



C' ^C'^ -C~^M (^M'^C'^m'J ^ M'^C^\ (12) 

We now simplify this equation by re-expressing the effect of 
the linear parameters ^ on the timing-residuals in terms of 
an orthonormal basis. To this end we factorise the matrix 
M with a singular value decomposition: 



M = C/EV* 



(13) 



where U and V are respectively (n x n) and (m x m) orthog- 
onal matrices, and E is an (n x m) diagonal matrix. For our 
purposes, the column space of the orthogonal matrix U is 
important. The first m columns of U span the column space 
of M, and the last n — m columns of U span the comple- 
ment. We now construct the matrices F and G as follows. 
U = (F G) , where F is the (n x m) matrix consisting 
of the first m columns of U, and G is the (n x (n — m)) 
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matrix consisting of the other columns of U. The following 
identities hold: 



F^F 
G'^G 

ff'^ + gg'^ 



(14) 



D + W 



Using these expressions, it is now possible to show that the 
marginalised likelihood of Equation (lllf) is equal tcQ: 



^(27r)"-'"det (G^CG) 
exp 



(15) 



-ft^G (g'^CG] ^G'^St 



More intuitively, the marginalised likelihood is the likelihood 
function of an (n — m)-dimensional random Gaussian pro- 
cess of the data G^St, with a covariance function G^GG. 
Because G only needs to be calculated once. Equation (|15p 
is an improvement over Equation (|11|) . Since G^CG is a 
PDS matrix. Equation (|15p is best evaluated by using the 
Cholesky decomposition, which also directly gives access to 
the value of the required determinant. 



3.3 Analytic post-fit covariance functions 

In the previous section, we have shown that we are allowed to 
use G^ = WGW^ instead of G in all our equations that de- 
scribe post-fit timing-residuals. We analytically approximate 
that quantity by using the inner product of Equation ^ , 



{x,y)E = J2 



a^At 



x{t)yit) dt, (16) 



where we have used Ei. 



jff^, with a the uncertainty 



of the TOAs, At is the time interval between observations, 
and x{t) and y{t) are continuous functions on the interval 
[— T, T]. We stress once more that our results do not depend 
on the PDS matrix E, and that these results therefore do 
not only hold for equally spaced observations with equal un- 
certainties. On the interval [— T, T] , we define an orthonor- 
mal b asis of quadratic functions (see lvan Haasteren fc LevinI 
I2OIOI . for a similar application): fi{t), f2(t), fsit): 



Mt) 
hit) 




(17) 



These basis functions satisfy {fi,fj)E = Sij. The process of 
fitting for quadratics can now be expressed as a projection 
of the covariance functions in terms of these basis functions: 



C^{to,t3) = S{to,ti)C{ti,t2)S{t2,ta), 



(18) 



^ A useful identity is: 



CU ■■ 



,fTcG 



(G'^CG)-^ G^CF 
pTcF - F^CG {G'^CG)-^ G^CF 



where from here onward, we always take inner-product given 
by Equation over the repeated variables ti and t2, and 
S{tk,ti) is given by 



(19) 



with 5{x) the Dirac delta function. Using this formalism, 
it is possible to analytically derive the projected covariance 
function G^ for TCSSs with any spectral density. 



3.4 Povirer-law covariance function 

Power-law spectra are of particular importance in PTA ap- 
plications, since the stochastic background of gravitational 
waves is expected to be a signal well-described by such a 
power spectral density (iBegelman et al.lll98Cll:[Phinnevll2001 • 
Jaffe fc Backeil l2003l : IWvithe fc Loebl l2003l : ISesana et all 



20081 ). Strictly speaking, a process governed by a power-law 



spectral density is improper, and therefore unphysical. For 
PTA purposes however, the power-law behaviour of the sig- 
nal is expected to hold within its expected frequency band of 
0.1-10 yr~^. To enforce finiteness of the covariance function, 
it is convenient to introduce cut-off frequencies in Equa- 
tion Q; see vHLML. These can be problematic in practice: 
the cut-off frequency needs to be low enough for the signal 
to represent a power-law signal, yet it must be high enough 
for its numerical representation not to cause numerical ar- 
tifacts due to limited machine precision. In this section we 
show that by choosing the projection matrix to represent 
a fitting procedure that includes quadratic spindown, the 
dependence of the covariance matrix on the low cut-off fre- 
quency is explicitly removed. 

We parametrise the spectral density as 



S{f) = 



1 



lyr 



f 



lyr- 



(20) 



with A the amplitude of the signal (units time^), and 7 is 
the spectral index. We require a low frequency cut off fh 
if 7 ^ 1. As shown in vHLML, in that case the covariance 
function is equal to: 



V fL 



{r(l-7)sin(^)(/Lr,,)--^ 



(2n)! (2n + 1 - 7) 



(21) 



where nj — 2n\ti — tj\. In vHLML it is shown that the re- 
moval of quadratic spindown from the timing-residuals also 
completely removes any dependencies on the low-frequency 
cut off fL ■ In the numerical calculations, we need to choose 
fL ^ so that C^^ is PDS, and so that we can neglect 
the terms with n ^ 2 in the summation of Equation (|2H) . 
In practice, the diverging terms dependent on the cut-off 
frequency can result in numerical artifacts due to limited 
machine precision. 

We apply Equation (|18|l to the covariance function of 
a power-law signal G^^ of Equation (|21|l to obtain the pro- 
jected covariance function (to, t^) for a power-law signal. 
We present the details of this calculation and the explicit 
formulae in Appendix [X] One of the key results of this cal- 
culation is that dependencies on the low-frequency cut-off /l 
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are removed from the infinite summation of Equation pip 
up to n = 2. This ensures that the quadratic spindown fits 
have completely removed the sensitivity to /z, up to 7 < 7. 
In previous studies (vHLML) it was thought that this was 
only true for 7 < 5. 

Now that we have an analytic expression for C^(to,i3) 
for a process with the power-law spectral density of Equa- 
tion (|20|l , it is possible to derive an expression for the average 
rms in the post-fit timing-residuals, valid for 1 < 7 < 7: 

fT 



Covariance function lor a power-law signal (f^ - 0.05 yr"^) 



2 



1 

2T 



dtC'^(t,t) 



3(5-7)(7-3)2^(27r)^-- 
7(H-7)(3-f 7)(5 + 7) 



(22) 



A^r(i 



-7) sin 



l-7rp7-l 



lyr^--T 



From this expression, we can derive an estimate for the rms 
generated by a GWB signal of the form: 



hc{f) 



S{f) = 



Ah 



127r2 



lyr- 



lyr' 



-2/3 



(23) 



lyr- 



where Ah is the dimensionless amplitude of the GWB char- 
acteristic strain he- This results in an estimate for the rms 
of a GWB; 



2 _ /l2 
""GWB — ^h 



9^7r3r(-f ; 
1001 



yr-3T~ 



(24) 



As the covariance function is a function of two variables, 
it is instructive to inspect the results of this section visually. 
First consider the pre-fit covariance function C^^ of Equa- 
tion (|21|l . This is a function of only the difference between 
the two parameters |ti — i2|, since it describes a stationary 
random process. This is illustrated in Figure [TJ where lines 
of equal covariance are lines of equal |ti — t2|. Secondly, we 
demonstrate what the effect of fitting for quadratics is on 
the covariance function in Figure [21 The symmetry due to 
the time-stationarity of the random process that is present 
in Figure [1] is broken, and we see prominent cubical fea- 
tures at the edges of the plot. Thirdly, we have included 
the effect of fitting to the entire timing-model of an exam- 
ple pulsar in Figure |3l This figure is very similar to Figure 
[51 except that there are some small-scale features on top of 
the general structure. This demonstrates that the quadratic 
spindown fitting is the effect that most prominently affects 
the covariance matrix; all other contributions are minor in 
comparison. 

Because the unweighted least-squares fit is unlikely to 
be optimal for any realistic dataset, it may seem that Fig- 
ures [2^31 do not represent a post-fit TCSS if a more appro- 
priate fitting procedure is used (e.g. the Cholesky method). 
One may therefore argue that Equation (|24|l is not a good 
measure of the amount of detectable GWB signal in the 
data. However, the equivalence of C and WCW^ in Equa- 
tion (|10p and Equation (|15p shows that the extra rms in the 
post-fit timing-residuals that may result from an improved 
fitting routine does not increase the sensitivity to a TCSS, 
since the marginalised posterior distribution for the TCSS 
parameters is always the same. The GWB rms quoted in 
Equation (|15p is the part of the TCSS that cannot be ab- 




eiyr) 



Figure 1. The covariance function C{t\,t2) of a power-law spec- 
trum of the form: S{f) = (A^/lyr"!) X (f/yr-l)-i3/3, with 
A = 2.9ns. For comparison, this is equivalent to a gravitational- 
wave background with Ah = 10^^^. Here the effect of fitting has 
been neglected. Notice that the covariance function only depends 
on the value ti — t2- 



Covariance function for a power-law signal witii qsd fitting 




t2 (yr) 



Figure 2. The covariance function C(ti, t2) of the same TCSS as 
in Figurc[T] Here the cff'cct of fitting for quadratics has been taken 
into account analytically with Equation ||A7| | of Appcndix[Al No- 
tice that, in contrast to Figure[T] the covariance function not only 
depends on the value t\ — t2, but on both ti, and t2. 



sorbed in the timing-model: the averaged trace of any post- 
fit covariance function is greater than or equal to this value. 
The quoted GWB rms is therefore a measure of how much 
"detectable" signal is in the data. 



4 TIMING-MODEL ANALYSIS 

When doing an MCMC, we analytically marginalise over the 
timing-model parameters. We would like to retain the infor- 
mation about the timing-model parameters, without adding 
these dimensions to the MCMC. In this section we show how 
to do that efficiently. 

The marginalised posterior of Equation (|15p allows one 
to numerically marginalise over all the stochastic parame- 
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Covarrance function for a power-law signal with timing model fitting 




stochastic parameters. 

2) Arithmetically averaging the Gaussian timing-model pos- 
teriors from each point of the chain. 

The full timing-model posterior distribution can then be 
used to obtain marginalised posterior distributions for any 
combination of timing-model parameters. Importantly, as 
will be demonstrated shortly, all this is done without any 
extra significant computational or memory cost on top of 
what is already used in building a chain in the stochastic 
parameter space. 

Operationally, as in vHLML, the MCMC is performed 
using Equation (|15[) . where we analytically marginalise over 
the timing-model parameters. Ifowever, at each step of the 
Markov Chain, we save the following quantities: 



Figure 3. The covariancc function C(ti, t2) of the same TCSS as 
in Figurc[T] Here the effect of fitting for the whole timing-model of 
J1640-I-2224 has been taken into account numerically. Notice that, 
in contrast to Figure [T] the covariance function not only depends 
on the value ti — t2, but on both ti, a nd t2. We use the ti ming- 
model parameters from the literature jLohmer et al.ll2005l ). and 
the timing-model as used in Tempo2, which includes the parame- 
ters: position (right ascension & declination), quadratic spindown, 
proper motion (right ascension & declination), eccentricity, and 
Al. 



ters of the model, while analytically marginalising over the 
timing-model parameters with a flat prior. Using that equa- 
tion it is impossible to do inference of the timing-model pa- 
rameters. Here we show what extra steps need to be taken 
in order to infer the timing-model parameters. We rewrite 
the likelihood of Equation ((5| as follows, using the same 
notation as in Equation (|i5p : 



Pirn, 



^(27r)"detC 



(25) 



X exp (^-^St^G (g'^CG^ ' G'^Stj , 

where x = {M'^ C~'^ A'ly^ M'^ C'^St, E^^ = M'^C~^M, 
and as before, the stochastic parameters are stored in the l- 
dimensional vector (f). In Appendix|B]we show how to include 
timing-model parameters with Gaussian priors in a similar 
manner. We are interested in recovering the timing-model 
parameters 



(26) 



where the ii denote the basis vectors of the timing-model 
parameters space. 

The main idea is that in Equation (|25|) . which is based 
on a linear approximation to the timing-model, the likeli- 
hood function is a multivariable Gaussian with respect to 
the timing-model parameters. The full posterior distribu- 
tion can be reconstructed without numerically exploring the 
timing-model parameters by: 

1) Constructing a Markov Chain, using the vHLML poste- 
rior distribution of Equation (|15|) that faithfully samples the 



This does not require any additional calculations in the 
MCMC, and for each MCMC step the amount of data that 
has to be saved is of the order m^, which is not expected 
to be a bottleneck in terms of storage space on modern 
workstations. We store these quantities for each step of the 
Markov Chain, which has been run in the /-dimensional pa- 
rameter space of (j), and we thus have enough information to 
fully characterise the (Z-l-m)-dimensional posterior distribu- 
tion function. Just as in vHLML, the marginalised posterior 
for the stochastic parameters can be calculated as usual 
from the MCMC. 

We assume here that we are interested in calculat- 
ing the 2-dimensional marginalised posterior as a func- 
tion two timing-model parameters, say and with 
1 ^ k,l ^ m, but the generalisation to a dimensionality 
other than two is straightforward. The evaluation of the 2- 
dimensional marginalised posterior consists of numerically 
integrating over the stochastic parameters (summing over 
the MCMC samples), and analytically integrating over all 
but two timing-model parameters along the lines of Equa- 
tion (|15|l . Details of this calculation are given in AppendixiBl 
here we give the result: 



exp ( ^ A^Lg {LIT.Lg ) ' Ll 



{2-kY det (L'^^Lg) 

(27) 

where we use (...) to average over all MCMC samples, the 
(m X 2) matrix La = (es, ei), with Ci the i-th basis vector 
for R™, and 



?! - XI 



(28) 



Equation p7p allows one to correctly infer the parameters 
of the timing-model, while taking into account the effect of 
red timing noise. 



5 TESTS ON AN ENSEMBLE OF MOCK 
DATASETS 

We test the procedures we describe in this work with mock 
TOAs. The TOAs are simulated observations of a pulsar 
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with known timing-model parameters, and added noise with 
known statistical parameters. In previous studies, analysing 
just one datase t with an MCMC was a co mputational chal- 
lenge fvHLML. Ivan Haasteren et al ] |201ll '). Extensive statis- 
tical studies of the behaviour of the data analysis method 
have therefore not been carried out in those studies. In this 
section we show that the Bayesian data analysis method 
has the desired statistical properties by introducing and 
applying a method with which a whole ensemble of mock 
datasets can be analysed simultaneously without much com- 
putational overhead. 



5.1 MCMC and importance sampling 

Doing a full analysis of a single dataset is a computationally 
challenging task because we have to do non-trivial matrix 
algebra at each step of the MCMC. This makes a straightfor- 
ward analysis of a whole ensemble of datasets, say k — 1000 
datasets, computationally prohibitive. We seek to overcome 
this problem by analysing a whole ensemble of datasets si- 
multaneously when doing only one MCMC simulation. At 
each sample of the chain we efficiently evaluate the likeli- 
hood for each dataset, which in the end can be used to con- 
struct the respective marginalised posterior distributions. 

A necessary requirement for doing this, is that Equa- 
tion H25p can be evaluated for each dataset without re-doing 
all the matrix algebra. This is possible if all datasets are 
different realisations of the same process, which is not a re- 
striction for the purposes of this work. To ensure realistic 
simulations, we model our moc k data after the data f or pu l- 
sar J1713-I-0747 as published in lvan Haasteren et al.l (|201ll ). 
This model for our datasets has irregular sampling, greatly 
varying error bars for different TOAs, and an unknown jump 
in the middle of the dataset. The n TOAs of each dataset 
are generated as perfect realisations of the published timing- 
model, observed at the same MJDs, combined with a TCSS 
modelled as a random Gaussian process with a red spectral 
density and a flat high frequency tail. 

We simulate the contributions of the random Gaus- 
sian process to the TOAs by, for each dataset, appropri- 
ately transforming a vector of pseudo-random numbers ^ 
with entries drawn from a normal distribution with mean 0, 
and width 1. The simulated timing-residuals are then con- 
structed as St = L(, with L the lower diagonal Cholesky 
decomposition of the covariance matrix C of the random 
Gaussian process, defined by C = LL^ . We generate all 
datasets in the ensemble this way, an example of which is 
shown in Figure [T] All datasets in the ensemble are gener- 
ated with the same input parameters. 

We evaluate the likelihood function for each dataset i, 
and for each MCMC sample j, where i runs from 1 to the 
number of MCMC samples A'^, and j runs from 1 to n. The 
samples at which we evaluate the likelihood values Lij of all 
datasets come from an MCMC chain that we call the kernel 
chain. We postpone the details of how we have constructed 
this kernel chain until the next section, for now we assume 
that we have found a suitable kernel, where for each sample 
we have access to the kernel likelihood Loj, and the val- 
ues of the parameters. For a canonical MCMC simulation, 
producing a marginalised posterior distribution p{9) can be 



calculated as: 



L(e)Po(e)d" 



(29) 



where p{6i) is the marginalised posterior, 6i is the i-th com- 
ponent of the m-dimensional parameter vector 9, L{6) is 
the likelihood function, Po{9) is the prior distribution, and 
{■■■)ei=x indicates an ensemble average over the MCMC 
samples over all samples with i-th parameter equal to 9i. 
This expression assumes that the samples in the MCMC 
are sampled with a probability proportional to L{9)Po{9). 
In our case however, the MCMC samples are taken with a 
probability proportional to the kernel likelihood Lq{6). As 
with importance sampling techniques, we can adjust Equa- 
tion (|29|) to suit this new situation: 



L{e)Po{9) d" 



LoW^(^d" 
Lo{9) 



/ mPo{9) \ 



(30) 



5.2 Choosing a suitable kernel 

With Equation H30|) . we can efficiently produce marginalised 
posterior distributions for many datasets at a time. Provided 
there are enough samples in the MCMC, this expression 
is valid for any kernel likelihood function Lo{9). However, 
as with importance sampling, the efficiency of the MCMC 
is highly dependent on the choice of the kernel likelihood 
function, with it only being practical if the kernel likelihood 
function is very similar to the likelihood functions of the 
datasets. Our datasets satisfy that condition, because they 
are all realisations of the same processes. We therefore take 
the following approach to the construction of a suitable ker- 
nel likelihood function. 

1) We produce a realisation of data, which we call the ker- 
nel set, in an identical manner to how we produced the k 
datasets. 

2) We randomly delete 4/5 of the data points in the kernel 
set to ensure that the kernel set has a broader posterior dis- 
tribution for all parameters than the mock datasets. 

3) The likelihood function that belongs to the kernel set is 
the kernel likelihood. 

4) We make sure that for all the parameters that vary during 
the MCMC, that the true value of each parameter is inside 
the 1-(T region of the likelihood. If not, start at 1) again. 
By constructing a kernel likelihood like this, we are ensured 
that our kernel likelihood distribution covers all the high 
probability density (HPD) regions of all the likelihood func- 
tions. 



5.3 Statistical properties of the ensemble 

We use the method outlined above to test k = 1000 datasets. 
The random Gaussian process is a summation of several 
components: 

1) the error bars of th e individual data points of 
J1713-I-0747, as described in Ivan Haasteren et aTl (120111 ). 



8 van Haasteren and Levin 



Joint (Np7,) distribution 



Maximum likelihood value of (Npy^) for all datasets 




Metropolis 1 a 
Metropolis 2 a 
Metropolis 3 a 
Ensemble 1 a 
Ensemble 2 a 
Ensemble 3 a 
Kernel 1 a 
Kernel 2 a 
Kernel 3 o 



200 300 400 500 

Red noise amplitude (ns) 




200 300 400 

Red noise amplitude (ns) 



Figure 4. Analysis of the TOAs of Figure [7] with two meth- 
ods: a regular MCMC, and the importance-resampling method 
of Equation The MCMC contours are marked "Metropo- 

lis", and the importance-resampled contours are marked "En- 
semble". Also, the contours of the kernel set that has been used 
are shown, marked by "Kernel". In all cased, the 68%, 95%, and 
99.9% contours arc shown. The true values for this simulation 
were: Nr = 145ns and ■yr = 5.4. 

2) an extra component of noise that is added in quadrature 
to all error bars. This parameter is the same for all data 
points, and represents the pulse jitter (EQUAD). 

3) a red timing- noise TCSS, described by a power- law spec- 
trum of the form S{f) = Ar^^(l/lyr-i)(f/lyr-^)-^', with 
Nr the noise amplitude, and 7r the spectral index that de- 
scribes the "redness" of the timing-noise. 

For each dataset, we have three parameters that vary during 
the MCMC, and we have 12 timing-model parameters that 
we analytically deal with during the MCMC. We show the 
marginalised posterior distribution of the red timing noise 
parameters here as an example in Figure[4l Our re- weighting 
scheme of Equation (|30|) has modified the kernel likelihood 
correctly to match the true posterior distribution. The sam- 
ple in the MCMC chain that has the highest likelihood value 
is a very good estimator for the maximum likelihood in an 
MCMC with this few dimensions. In Figure [5] we present 
the maximum likelihood estimators for the red timing noise 
parameters that we obtain in this way. The collection of 
estimators of the ensemble seem to display the same char- 
acteristics as the marginalised posteriors. 

The ensemble analysis has resulted in fc = 1000 distribu- 
tion functions in 15 dimensions. In order to keep our presen- 
tation of the results transparent, we restrict our discussion 
to the 15k one-dimensional marginalised posterior distribu- 
tion functions that follow from this analysis. For each of 
the 15k distributions, we have access to the true value that 
we gave as an input to our simulations. A very basic check 
would be to verify that for 68% of the datasets, the true 
value lies within the inner 68% of the marginalised poste- 
rior distribution. We generalise that type of basic check to 
a more extensive test of both the width and the shape of all 
the 15k distributions. 

Provided that our model is correct, the posterior distri- 
bution gives the probability that the true value of a param- 
eter has a certain value. Since we have done many trials. 



Figure 5. The maximum likelihood values for the parameters Nr 
and 7r, for the k = 1000 datasets of Section 15.31 The maximum 
likelihood values are taken to be the values of the parameters of 
the MCMC sample with the highest likelihood. This collection 
of estimators seems to display the same characteristics as the 
marginalised posterior of Figure|4] As in Figure|4] the true values 
for this simulation were: Nr = 145ns and jr = 5.4. 

we can count how many times the true value 9Y^° of pa- 
rameter 6i lies within the most-likely x% of the posterior 
distribution. By definition of the posterior, for large number 
of trials this number approaches x% of the total number of 
trials. More formally, we define the inner high-probability 
region (HPR) of the one-dimensional marginalised posterior 
as: 

jp{6,)A6, = a, 
w 

W = {6li e R : P(eO > La} , (31) 

where La is some value > unique for each a, where a 
is a probability with ^ a ^ 1. For each parameter, we 
define a threshold value Lt = P(6''^"°). The true value of the 
parameter lies within the HPR of the marginalised posterior 
distribution when L% > La- By definition of the posterior 
distribution, the probability that the true value lies within 
the HPR is given by Pr(I/t > La) = a, where we use Pr 
to denote probabilit ies. We defi ne the empirical distribution 
function (EDF) as (|Vaartll200ol '): 

fc 

FM^^Y.®^^'-^-"!^ (32) 

where <d(x) is the Heaviside function, here used as an indi- 
cator. The term in the summation of Equation (|32p is the 
indicator for the event Lt > La- For a fixed La, this is a 
Bernoulli random variable with probability a. Hence, Fi^k{a) 
is a binomial random variable with mean a, and variance 
o(l — a)/k- Therefore, by the law of large numbers: 

lim Fi^k (a) = a- (33) 

k — s-oo 

The Glivenko-Cantelli theorem (IClivenkd Il933l : ICantellil 
Il933h states that this convergence happens uniformly over 
a. In Figure|S]we present the empirical distribution function 
for all timing-model parameters in our simulation. 

We compare our EDF Fi^k{a) to the EDF as used in the 
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Empirical distribution functions for Bayesian analysis 
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Figure 6. Empirical distribution function Fi j.(a) — a for all pa- 
rameters i, with k = 1000, for the Bayesian analysis. We used 
mock data of J1713-I-0747 of Section [5]3] with red noise mod- 
elled with a power-law spectral density. The Kolmogorov-Smirnov 
boundaries with significance level a = 0.01 are displayed as the 
b_) lines. 



Kolmogorov-Smirnov (K-S) test in statistics, where one can 
test for the equality of a sampled distribution function to a 
reference distribution function. Given a number of samples, 
the K-S test statistic quantifies how much the distribution 
of the samples, and the reference distribution are alike. Al- 
though we have not one single, but many reference distribu- 
tions, we can define a similar K-S statistic for our EDF: 



= sup j-Fi,fc(a) - a\ 



(34) 



This K-S statistic is our quantitative test whether or not 
our data analysis method is correct. In the canonical appli- 
cation of the K-S statistic, one chooses as threshold for the 
quantity \/fc-Di,fc, which is expected to follow a Kolmogorov 
distribution Pk{K): 



(35) 



where our significance a is determined by Pk{K ^ Ka) = 
1 — Q. Two commonly used values are: a = 0.05 with 
Ka — 1.36, and a = 0.01 with Ka = 1.63. We choose our 
significance level as a = 0.01, which together with k — 1000 
implies that we should reject the null-hypothesis that our 
analysis method is correct when Di^k > 0.052. We can see in 
Figure [6] that we do not need to reject the null- hypothesis. 



6 COMPARISON WITH THE CHOLESKY 
METHOD 

Recently, CHCMV have proposed a new method to deal with 
TCSSs in pulsar timing observations: the Cholesky method. 
The Cholesky method describes the problem of fitting to 
the timing model as a whitening problem, where both the 
data and the description of the timing model need to be 
whitened with a Cholesky decomposition matrix. This ap- 
proach is identical to a GLS fit to the timing model given by 
Equation ([6|. This requires prior knowledge of the covari- 
ance matrix C, which CHCMV substitute with a best esti- 
mator for the power spectral density of the timing-residuals. 



produced with an advanced spectral analysis method. This 
spectral analysis method is implemented in the form of a 
Tempo2 plugin called spectralModel. 

The algorithm implemented in spectralModel allows de- 
termination of the power spectral density of the post-fit 
timing-residuals, assuming that the power spectral density 
has some specific form. For steep red TCSSs as used in this 
work, the power spectral density is modelled as a power-law 
with a so-called corner frequency fc'. 



Pif) 



A' 



(36) 



where A is the amplitude of the TCSS, / is the frequency, 
and Q is the spectral index. The difference with a pure 
power-law as we use in Equation H20|l . is that this power 
spectral density does not diverge for / — > 0. The user needs 
to provide estimates for a and fc and check that these are 
correct; fitting to the data is only done for the amplitude A. 

A direct comparison between the Cholesky method of 
CHCMV and the inference of timing-model parameters with 
a Bayesian analysis as done in this work can be made. Both 
methods take into account the fact that the TO As may con- 
tain a TCSS with a red power spectrum, of which estimates 
can be obtained. And both methods obtain improved es- 
timates of the timing-model parameters due to the incor- 
poration of the TCSS contribution to the TOAs. However, 
several important differences should be highlightec0. 

Firstly, the modelling of the observations is different. 
We model the TCSS as a stationary random Gaussian pro- 
cess that is added to the TOAs, prior to the fitting pro- 
cedure. In the Cholesky method, the TCSS is modelled as 
a stationary signal in the post-fit timing-residuals. As we 
have shown in Figure [T]|31 this stationarity breaks down in 
the fitting process. We believe that this raises a question 
about the spectral estimation method of CHCMV, since the 
post-fit timing-residuals cannot be described by a stationary 
TCSS with a mathematically defined spectral density. 

Secondly, CHCMV do not fully account for the covari- 
ance between the TCSS and the timing-model parameters. 
The use of an optimal spectral estimate in a parameter es- 
timation technique analogous to Equation ([SJ is not com- 
pletely appropriate: the covariance matrix of the TCSS is 
itself covariant with the timing-model parameters, which re- 
sults in an incorrect covariance matrix for the timing-model 
parameter estimates, and incorrect uncertainties in the spec- 
tral estimates. CHCMV show that the incorrectness of the 
uncertainties is significant for the quadratic spindown pa- 
rameters, while it is less of a problem for all the other timing- 
model parameters. 

In Figure [7] we present one realisation of mock data of 
the ensemble of datasets we generated for J1713+0747. Be- 
sides the true residuals as generated by the random Gaussian 
process, we also present three reconstructions of the timing- 
residuals: 

1) The input timing-residuals to all analysis methods. These 



^ We emphasise that we only refer to the theoretical description 
of CHCMV. This does not include the practical implementation 
of the Cholesky method, spectralModel, which in itself has very 
useful general features such as robust spectral estimation. 
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Reconstruction of mock residuals of J1713+0747 as used in thie ensembie 
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Figure 7. Example of the mock timing-residuals analysed in the 
ensemble, and their reconstruction with various algorithms, all 
offset from each other for clarity. The mock timing residuals are 
based on th e observing scheme a n d tim ing-model of J1713-f 0747 
as used in Ivan Haasteren et al. I 1I2OIII') . The error bars in the 
figure are mostly too small to see in this figure, and they vary be- 
tween different observations. The TCSS in these residuals comes 
from a source with the following spectral density components: 

1) The error bars of the individual observations. 

2) A white noise component describing the pulse jitter (EQUAD), 
with rms 200ns. 

3) A power-law red noise component S{f) = 
Af^(l/lyr-i)(f/lyr-i)-T', with amplitude Nr = 145ns, 
and 7r = 5.4. 

In the figure, four reconstructions of the same realisation are 
shown: 

True: The true timing-residuals as generated by the TCSS. 
ML: The timing-residuals as reconstructed using the maximum 
likelihood values for all parameters: both stochastic parameters 
and timing-model parameters. 

Cholesky: The timing-residuals, reconstructed using a covariance 
matrix produced with the Tempo2 plugin "spectralModel" , 
which is an implementation of the Cholesky method of CHCMV. 
Input: The timing-residuals as produced by Tcmpo2 after a 
normal weighted least-squares fit. This "Input" set is used as the 
input timing-residuals for all methods. 

In the ML and Cholesky reconstructions, we have marked the 
true timing-residuals as a dashed line, and we have marked the 
pulse period l-tr boundaries with a solid line. These solid lines 
demonstrate what the residuals would look like if we changed 
the pulse period, FO, by ± l-o", and therefore give an impression 
of how well this parameter is determined from the data. 

were not the true timing-residual^, but the timing-residuals 
after a weighted least-squares fit was subtracted from the 
timing-residuals with Tempo2 

2) Cholesky timing-residuals. We used the spectralModel 



We actually worked with TOAs. The residuals plotted in Fig- 
urc[7|arc produced using different Tempo2 ".par" files. 
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Figure 8. Similar plot as Figure (6] but now for a method that 
combines a generalised least-squares fit with a maximum likeli- 
hood spectral estimator (similar to the Cholesky method). The 
same mock datasets as in Figure [6] are used. We see that such a 
method performs well, with results similar to the Bayesian analy- 
sis. Only the quadratic spindown parameters (the average (offset), 
the pulse period FO, and the period derivative Fl) are slightly out- 
side the K-S boundaries. This means, assuming Gaussian proba- 
bility distributions, that the rms of the parameter estimates was 
at least 1.11 times the estimated uncertainty. 



plugin for Tempo2 to produce an estimate for the covari- 
ance matrix of the post-fit timing-residuals. The Cholesky 
timing-residuals are constructed using that estimate and 
Equation (|6]). 

3) ML timing-residuals. We used the maximum likelihood of 
Equation (|15p for the stochastic parameters to produce a 
best estimator for the covariance matrix C. This results in 
the maximum likelihood estimator timing-residuals through 
Equation i^. 

One can see that the maximum likelihood timing- 
residuals approximate the true timing-residuals slightly bet- 
ter than the Cholesky timing-residuals: the Cholesky timing- 
residuals deviate slightly more at the sides, with the true 
residuals not everywhere inside of the l-cr bound of FO, in- 
dicating an error in the low- frequency behaviour. The 1- 
(7 bounds of FO also seem to be smaller than those for 
the maximum likelihood timing-residuals. These statements 
were generally true for all the realisations of the ensemble 
of mock datasets. Besides due to the issues raised above, 
this may also be due to the difference in modelling of the 
power spectral density: the true timing-residuals have been 
generated with a strict power-law, and a very low-frequency 
cut-off. However, the results here seem to be consistent with 
Table 4 of CHCMV. 

We would like to perform a K-S test on the results of 
the Cholesky method to check for consistency. However, this 
comparison on the ensemble of datasets presented in this 
work would not be fair because the spectral model used by 
the spectralModel plugin would then be incorrect. Also, we 
believe that some of the issues with the Cholesky method 
that we raised above can be overcome. We therefore per- 
form a K-S test on the maximum likelihood equivalent of 
the Cholesky method: we use Equation (|25l) in conjunction 
with the maximum likelihood of Equation (|15p as an esti- 



time- correlated signals in pulsar timing 11 



mator for the covariance matrix C. This is equivalent to 
the Cholesky method when using an "optimal" estimate for 
C. Because this estimator does take into account the non- 
stationary nature of the post-fit residuals, this is effectively 
a limit on how well any whitening method can perform. Ap- 
plying this method to the same ensemble of mock data as in 
Section \5 . 31 vields Figure [S] We see that the quadratic spin- 
down parameters are slightly rejected by the K-S test, which 
shows that at least some bias in the quadratic spindown pa- 
rameters of any whitening method is due to the covariance 
of the TCSS with the quadratic spindown parameters. The 
width of all the posterior distributions was similar between 
the approach of Figure [6] and Figure [S] 

For the quadratic spindown parameters to be rejected 
by a K-S test of this magnitude means that, assuming Gaus- 
sian probability distribution functions, the rms of the pa- 
rameter estimates was at least a factor of 1.11 times larger 
than the estimated uncertainty. Table 4 of CHCMV shows 
that their estimates for FO and Fl were over a factor of three 
too large, which corresponds to sup^ \Fi±{a) — a| > 0.43. 
With the 100 realisations of mock data they used, the K- 
S bound would be 0.16. This would be a firm rejection. We 
therefore agree with CHCMV that the Cholesky method can 
be further improved to give more reliable spectral estimates 
at the very low frequencies. 



7 CONCLUSIONS 

We investigate time-correlated stochastic signals (TCSSs) 
in pulsar timing data analysis. TCSSs are significantly in- 
fluenced by fitting procedures that solve for timing-model 
parameters, and timing-model parameters estimates can be 
biased due to absorption of of power the TCSS. We formally 
analyse the covariance between the timing model and TC- 
SSs, and obtain closed expressions describing the behaviour 
of the TCSSs when fitting to the timing-model. New results 
we derive in our analysis; 

1) Proof that the results of the Bayesian analysis are unaf- 
fected by use of different fitting methods (e.g. (un)weighted 
least-squares), provided that the timing solution has con- 
verged. 

2) Closed expressions for the post-fit correlations of signals 
with known power spectra. 

3) Analytical closed expressions for the post-fit covariance 
function of power-law signals with quadratic spindown fit- 
ting. This includes proof that the low-frequency cut-off is 
removed up to spectral indices up to 7 = 7, corresponding 
to a = 3 for the GWB. 

4) More computationally efficient expressions for the 
marginalised posterior distribution of vHLML. 

5) An analytical expression of the post-fit rms induced by a 
stochastic gravitational- wave background. 

6) Equations on how to extract the timing-model parame- 
ters from Bayesian MCMC simulations. 

7) A new method to analyse hundreds of mock datasets si- 
multaneously with a Bayesian analysis, without significant 
computational overhead. 

8) A powerful test to check whether any data anal- 
ysis method produces consistent results, based on the 
Kolmogorov-Smirnov test. 

We test our method on many realisations of mock data. 



and find that the shape, width, and position of the pos- 
terior distributions are consistent with the input values of 
the parameters. We compare our results to methods that 
use a spectral esti mate to whiten the timing-residuals, like 
IColes et al.1 (|201lf ). and find that an optimal whitening 
method performs equally well as our own method, except 
for the quadratic spindown parameters, in which case the 
Bayesian analysis produces more consistent results. 
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APPENDIX A: POWER-LAW COVARIANCE 
FUNCTIONS 

In this Appendix, we analytically derive the post-fit covari- 
ance function WC^^W^ from Equation ([TDJ of the mam 
text. We rewrite the relevant expressions for the basis- 
functions and the projection operators here for convenience, 
with the same notation as in Section [3.31 

fT 

x{t)y{t) dt 



h(t) 
hit) 



1 



a^At 
1 



(Al) 



T 

At 



7!"V T 



I At t 



'45 At 
-a\ — 

r 



= SitoMC^ {tlMS{t2,t3) 
3 



hereafter we always sum over the repeated indices ti and 
t2- Because the pre-fit covariance function of a power- law 
spectral density depends only on r = 2%\to — t-i\, we first 
calculate the following quantity: 

Zl (to,t3) = S{to,ti) |ti - tal^ S{t2,t3). (A2) 

We can then construct with several termfl We write 
the resulting Z^ in the following terms: 

3 3 
Zc {t0,t3) = \to-t3\'^ -J2Z^i^0,t3)+ J2 Zij{to,t3). 

i=l i,j=l 

(A3) 

The Zij terms are symmetric in i and j, and after evaluation 
of the (somewhat tedious) integrals we fincQ: 



Zij (to, ^3) 

Ull {to,t3) 
U12 {to,t3) 

Ul3 {to,t3) 

U22 {to,t3) 
U23 (to, t3) 

U33 {toM) 



[/.(tl)|tl-t2|^/,(t2)]/;(to)/,(t3) 



2(1 + 0(2 + 




15C((^)^ + (»)^-|) 

4(2 + 0(3 + 0(4 + 

9C^ 

2(1 + 0(2 + 0(4 + 



225C(C-2)((M)^_i)((^ 



(A4) 



2 _ 1 

T I 3 



8(1 + 0(2 + 0(4 + 0(6 + 



where we have used: 



(2T)^+C Hi— J 



We found the Zi terms to be: 



Z^{ta,t3) = f^(to)f^{tl)\tl-t3[^ 



(A6) 



Z\ (to, ts) = 



Z3 (io, ^3) 



+ 



Z2 (to, ta) = 



+ 



+ 



+ 



+ 



lfo-t2|«.A(t2)/,(t3) 

(r + to)^+^ + (r-to)^+^ 

2r(i + o 
(r + t3)^+^ + (r-t3)'^+^ 

2T(l + 
3 (-ii (T + to)^+' + ^ (T - to)«+') 

2r(i + o 

3 (-^ (T + t3)^+' + |i (T - t3f+^) 

2r(i + o 

3(f (r + t3)«+'-f (r-t3)«+2) 

2T2(1 + 0(2 + 
3(^(r + to)«+'-^(r-to)«+') 

272(1 + 0(2 + 



+ 



+ 



15 ( 




-l) ((T + t3)^+' + (r- 
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15 ( 
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-|) ((T + to)'-"+' + (T- 










4r(i + o 






45 ( 
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-I) ((r + t3)^+' + (T- 
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4T2(1 + 0(2 + 






45 ( 


iW 


- 1) ((r + to)^+' + (T- 




h2\ 






4r2(i + 0(2 + 






45 ( 


iW 


- 1) ((T + t3)«+' + (T- 




h3\ 






4^^(1 + 0(2 + 0(3 + 






45 ( 




- i) ((r + to)«+' + (r- 





4r3(i + 0(2 + 0(3 + 



Then can be obtained by substituting the above expres- 
sions into the following: 



1 \ 7-1 



V u 

E(-i; 



{r(l-7)sin(^) {h2nr-'Z 



p 

7-1 



(2n)!(2n + l-7)'"2" ^"^^^ 

Interestingly, Zq = Z2 = Z^ = 0. This means that for 
7 < 7, all the /l dependent terms in vanish due to the 
removal of quadratic spindown. 

In the calculations of the rms of signals, we also need 
the following integral, valid for > 0: 



i,r^aM)dt^ J(^;;,o(c^;)2;;;-rc^^ ^^^^ 



2r 



(1 + 0(2 + 0(4 + 0(6 + 



^ Wc note that in general Cij = \ti—tj\^ is not a PDS matrix, and 
it therefore docs not correspond to a physical stochastic process. 
^ The calculations are available from the authors by request. 



This result does not contradict Zq — 0, since for <^ = 
the integral does not exist: for 7 ^ 1 we also need a high- 
frequency cut-off for the power spectral density. 
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APPENDIX B: GAUSSIAN PRIORS AND 
TIMING MODEL ANALYSIS 

In this Appendix we show how to deal with Gaussian pri- 
ors for the timing model parameters efficiently, and how to 
derive Equation (f27)) . 



Bl Gaussian priors 

Besides with fiat priors, analytically marginalising over 
timing-model parameters is also possible with Gaussian pri- 
ors for the the timing-model parameters. We define Gaussian 
priors for the m timing-model parameters ^ as: 



..(f) 



exp 



\/(27r)'"det Eq 



(Bl) 



where are the maxima of the prior probabilities, and Eq 
is the (m x m) prior covariance matrix of the timing model 
parameters. We now proceed with Equation (O multiplied 
with this prior, and rewrite this analogous to what we did 
in Equation pSj) : 



exp [st^C-^St + x^S-^x + CIS-^CJ. 



exp 



^(f-x) 



(271)" + ™ 
T 



det Eo det C 

T' 



-1 



(B2) 



with 
X 



(^A'FC-^M + Eo ^) ^ (^M'^C'^St + Eo '^"0 



(B3) 



Up to a normalisation constant due to the inclusion of the 
prior, these expressions reduce to Equation (|25p if we take 
E(^^ — 0, and ^0 = 0. Marginalising Equation (|B2|I over the 



timing-model parameters gives 

P (^f\5t) : 



VdetE 



(B4) 



\/(27r)"det Eo detC 
exp [si'^C-'si + x^E-^x + C?So 'Co]) , 



We assume that we would like to obtain the 2- 
dimcnsional marginalised posterior as a function of the pa- 
rameters and ^i, with 1 ^ k,l ^ m, but the generali- 
sation to a different dimensionality is straightforward. The 
2-dimensional marginalised posterior is constructed by in- 
tegrating over all elements of ^, except for ^fc and ^i. This 
integration is analogous to what we did with Equation ((5}- 
(|15l) . We therefore construct two auxiliary matrices similar 
to F and G of Equation (fT^ : 



Lf = (• ■ • Gk-i ek+i 
La = (efc ej) , 



ei-i ei+i 



(B6) 



where the ii are the basis vectors of R™. Similar to Equa- 
tion (|15|l . the 2-dimensional marginalised posterior now be- 
comes; 



where 



exp^AC Lg{LI^Lg) 'LgAC 



(27r)^det {L'^Y.Lg) 



- XI 



(B7) 



(B8) 



B2 Posteriors for the timing-model parameters 

The MCMC samples are drawn from P{(f)\5t) of Equa- 
tion (|B4[) . which is P{(j),^\St) of Equation (|B2[l marginalised 
over ^. However, we are interested in interested in P{^\5t), 
which is P{(j>,^\5t) marginalised over all stochastic parame- 
ters (j). Using an importance sampling approach, we approxi- 
mate the full posterior distribution with the MCMC samples 
as: 




where we use (...) to average over all MCMC samples. 



